USAARL  Report  No.  2009-05 


The  Effect  of  Event  Rarity  on  the 
Perception  of  Correlationally 
Indeterminate  Data 

By  Amanda  M.  Kelley 
(USAARL) 

Richard  B.  Anderson 


Warfighter  Performance  and  Health  Division 
February  2009 


Approved for public release, distribution unlimited.


U.S. Army

Aeromedical  Research 
Laboratory 


Notice 


Qualified  requesters 

Qualified  requesters  may  obtain  copies  from  the  Defense  Technical  Information  Center  (DTIC), 
8725  John  J.  Kingman  Road,  Suite  0944,  Fort  Belvoir,  Virginia  22060-6218.  Orders  will  be 
expedited  if  placed  through  the  librarian  or  other  person  designated  to  request  documents  from 
DTIC. 

Change  of  address 

Organizations  receiving  reports  from  the  U.S.  Army  Aeromedical  Research  Laboratory  on 
automatic  mailing  lists  should  confirm  correct  address  when  corresponding  about  Laboratory 
reports. 

Disposition 

Destroy  this  document  when  it  is  no  longer  needed.  Do  not  return  it  to  the  originator. 
Disclaimer 


The  views,  opinions,  and/or  findings  contained  in  this  report  are  those  of  the  author(s)  and 
should  not  be  construed  as  an  official  Department  of  the  Army  position,  policy,  or  decision, 
unless  so  designated  by  other  official  documentation.  Citation  of  trade  names  in  this  report  does 
not  constitute  an  official  Department  of  the  Army  endorsement  or  approval  of  the  use  of  such 
commercial  items. 

Human  Use 


Human  subjects  participated  in  this  study  after  giving  their  free  and  informed  voluntary  consent. 
Investigators  adhered  to  Army  Regulation  70-25  and  USAMRMC  Regulation  70-25  on  use  of 
volunteers  in  research. 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


The  public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources, 
gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of 
information,  including  suggestions  for  reducing  the  burden,  to  Department  of  Defense,  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports  (0704-0188), 
1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any
penalty  for  failing  to  comply  with  a  collection  of  information  if  it  does  not  display  a  currently  valid  OMB  control  number. 

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


1. REPORT DATE (DD-MM-YYYY)

18-02-2009

2. REPORT TYPE

Final


4.  TITLE  AND  SUBTITLE 

The  Effect  of  Event  Rarity  on  the  Perception  of  Correlationally  Indeterminate 
Data 


3.  DATES  COVERED  ( From  -  To) 


5a.  CONTRACT  NUMBER 


5b.  GRANT  NUMBER 


5c.  PROGRAM  ELEMENT  NUMBER 


6.  AUTHOR(S) 

Amanda  M.  Kelley 
Richard  B.  Anderson 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

US  Army  Aeromedical  Research  Laboratory 

P.O.  Box  620577 

Fort  Rucker,  AL  36362-0577 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 

USAARL  2009-05 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

US  Army  Medical  Research  and  Materiel  Command 

504  Scott  Street 

Fort  Detrick,  MD  21702 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 

USAMRMC 


11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 


12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release,  distribution  unlimited. 


14.  ABSTRACT 

Previous  research  has  indicated  that  events  that  are  rare  are  more  informative  than  common  events.  The  present  study  manipulated 
event  rarity  through  social  stereotypes  to  evaluate  event  rarity’s  role  in  the  perception  of  correlationally  indeterminate  data.  Social 
stereotypes  were  used  as  a  means  to  manipulate  expectations  about  which  observations  would  be  considered  rare  and  which 
common.  Participants  were  presented  with  a  correlationally  indeterminate  sample  and  were  asked  to  rate  the  correlational 
relationship  in  the  population  from  which  the  sample  was  drawn.  The  results  did  not  support  the  event  rarity  hypothesis  but  were 
consistent  with  confirming  hypothesis  testing  behavior.  Further  research  is  ongoing  to  evaluate  what  factors  may  influence 
differential  behavior  (e.g.,  preference  for  common  over  rare  observations  and  vice  versa)  in  the  perception  of  correlationally 
indeterminate  data. 


15.  SUBJECT  TERMS 

causal  judgment,  correlation  detection,  social  beliefs 


16. SECURITY CLASSIFICATION OF:

a. REPORT: UNCLAS

b. ABSTRACT: UNCLAS

c. THIS PAGE: UNCLAS

17. LIMITATION OF ABSTRACT

18. NUMBER OF PAGES

19a. NAME OF RESPONSIBLE PERSON

Loraine Parish St. Onge, PhD

19b. TELEPHONE NUMBER (Include area code)

334-255-6906


Standard  Form  298  (Rev.  8/98) 
Prescribed  by  ANSI  Std.  Z39.18 




Acknowledgements 


The  authors  would  like  to  express  their  sincere  gratitude  to  the  following  people  for  their 
contributions  to  this  project. 

•  Ms.  Elizabeth  Stokes  for  help  with  administrative  matters. 

•  Dr.  Loraine  St.  Onge  for  her  editorial  assistance. 

•  Anthony  Lauricella  for  help  with  the  data  collection. 

•  Dr.  Jennifer  Gillespie  for  help  with  the  data  collection. 

•  Lance  Jones  for  help  with  the  data  collection. 




Table of contents

Introduction
    Military significance
    Background
Research questions
    Symmetric variables
    Correlationally indeterminate data
    Event rarity manipulation
Research objective
Methods
    General
    Participants
    Procedure
    Group membership
Results
Discussion
    Future studies
Conclusions
References
Appendix. Trial sample

List of figures

1. A contingency table representing the components of delta P applied to a causal scenario
2. The mean relationship ratings as a function of the A:B or C:D ratio of cell frequencies in
   Kelley, Anderson, & Doherty (2007)
3. Data from Kelley (2007)
4. Contingency table describing variables used in study
5. Mean ranks for each contingency table cell
6. Mean relationship ratings when Cell A is the rare observation
7. Mean relationship ratings when Cell A and Cell D observations are equally likely
8. Mean relationship ratings when Cell D is the rare observation


Introduction 


The ability to detect relationships in the environment is essential and underlies other
cognitive processes such as categorization and stereotype formation. Crocker (1981) described
correlational knowledge as essential to our ability to “explain the past, control the present, and
predict the future” (p. 272). There is a large literature investigating how people determine
correlational and causal relationships. Two prevailing approaches to understanding this ability
are inferential and traditional. Under the inferential approach, people attempt to “determine the
likelihood that there is a relationship between the variables” (McKenzie & Mikkelson, 2007). In
contrast, the traditional perspective asserts that people summarize the information and
observations available to them. Some recent studies (Griffiths & Tenenbaum, 2005; Kelley,
Anderson, & Doherty, 2007; McKenzie & Mikkelson, 2007) have focused on demonstrating
evidence for an inferential approach over the traditional approach in that the inferential approach
can explain findings in the correlation and causation perception literature that the traditional
viewpoint cannot.

The current study evaluated causal judgments in the context of social beliefs. Ongoing work
at USAARL is evaluating biases and errors in causal judgment in Soldiers after periods of sleep
deprivation. It is hypothesized that Soldiers who are sleep deprived will show strong dependence
on previous beliefs (including social beliefs) to form a present judgment. Overweighting prior
beliefs and knowledge has been shown to significantly increase the likelihood of judgment errors
in predicting future events. Thus, understanding the role that prior beliefs have in causal
judgment under “normal” conditions allows for comparison to performance under conditions of
stress (or, specifically, sleep deprivation). The ongoing work was designed to test predictions
made by the event rarity hypothesis and positive test strategy as a result of the outcome of this
study.


Military  significance 

Causal judgment and covariation detection are important to military operations such as
intelligence analysis (Heuer, 1999). If these abilities are compromised, then Soldiers and other
military personnel are more likely to make potentially major errors in judgment, for example in
predicting events and in taking precautionary actions. It is predicted that under situations of
high stress, causal judgments rely heavily on prior expectations and beliefs, as suggested by the
adaptive component of the inferential approach and McKenzie and Mikkelson’s (2007) event rarity
theory. It should be noted that this study was conducted at Bowling Green State University as a
follow-up to the first author’s doctoral program and was not a USAARL project. The study
serves as a basis for current USAARL projects conducted by the primary author.

Background 

Typically, observations in a covariation assessment task of dichotomous variables are
summarized using a contingency table, which has four cells: Cell A is the presence of both
variables, Cell B and Cell C correspond to the presence of one variable and absence of the other,
and Cell D is the absence of both variables (figure 1). The cells of the table are used to calculate
the generally accepted measure of correlation between dichotomous variables, delta P, the
formula for which is shown in Equation 1.

$$\Delta P = \frac{A}{A+B} - \frac{C}{C+D} \tag{1}$$


                       EFFECT
                   Present   Absent
CAUSE   Present       A         B
        Absent        C         D

Figure 1. A contingency table representing the components of delta P applied to a
causal scenario. Each cell represents the frequency of observations.
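
As a rough illustration of Equation 1, the following Python sketch computes delta P from the four
cell frequencies. The function name `delta_p` and the example frequencies are illustrative
assumptions and are not taken from the study. The sketch also shows what happens when a sample
contains only Cell A and Cell B observations (or only Cell C and Cell D observations): one of the
two proportions has a zero denominator, so delta P cannot be computed and the sample is
correlationally indeterminate.

```python
from typing import Optional


def delta_p(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Return delta P = A/(A+B) - C/(C+D), or None when it is undefined.

    a, b, c, d are the observation counts in Cells A-D (hypothetical inputs).
    """
    if a + b == 0 or c + d == 0:
        # One row of the contingency table is empty, so one proportion is 0/0.
        return None
    return a / (a + b) - c / (c + d)


if __name__ == "__main__":
    print(delta_p(6, 2, 2, 6))  # determinate sample: 0.75 - 0.25
    print(delta_p(6, 2, 0, 0))  # cause present on every observation: undefined
```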

A number of studies have shown that people find the cells to be of unequal importance in
covariation assessments such that Cell A is the most important, followed by Cell B and Cell C,
and Cell D is the least important; A > B > C > D (e.g., Levin, Wasserman, & Kao, 1993). While
this finding has been discussed previously as evidence that non-normative processes are
occurring (e.g., Kao & Wasserman, 1993), McKenzie and Mikkelson (2007) are the first, as far
as the authors are aware, to use an inferential approach to explain this finding. Specifically,
McKenzie and Mikkelson’s event rarity theory argues that probabilistically rare observations are
more informative to the perceiver than “common” observations. Rarity, in this case, is defined in
terms of log likelihood ratios (i.e., the ratio of the probability of an observation given a
correlated population to the probability of an observation given an uncorrelated population; see
also Anderson, 1990). In simpler terms, people hold beliefs or expectations about the occurrence
and non-occurrence of events in the environment. The occurrence of an event is rare if one’s
expectations indicate that the non-occurrence of the event is more likely. Alternatively, the
non-occurrence of an event is rare if one’s expectations indicate that the occurrence of the event
is more likely. If the occurrence (or presence) of each of two events is less likely than the
non-occurrence (or absence) of those events, then the observation of co-occurrence is the rarest
observation possible (i.e., given that the other possibilities are co-nonoccurrence, and the
occurrence of one event and not the other) and, ultimately, the most informative. To illustrate,
consider the following scenario: McKenzie and Mikkelson used the variables “mental health” and
“high school graduate,” the levels of which were “presence” and “absence.” Based on prior beliefs
about the environment, one arguably should expect a randomly selected person in the population to
be both mentally healthy (mental health - present) and a high school graduate (high school
graduate - present). In this case, co-occurrence is expected, thus common, and co-nonoccurrence
(i.e., the randomly selected person is neither mentally healthy nor a high school graduate) is
unexpected, thus rare. McKenzie and Mikkelson argue that if event rarity is the driving force
behind a bias for Cell A information, then a Cell D bias should be demonstrated when the absence
of the variables is less common than their presence (i.e., it is more common for an event to
occur than for that event not to occur). In other words, they hypothesized that by manipulating
the rarity of presence and absence of events, a preference for Cell A would be reversed to that for
Cell D.
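
The log likelihood ratio definition of rarity can be made concrete with a small Python sketch.
Everything numeric here is an assumption chosen for illustration: the marginal probabilities
`p_x` and `p_y`, the population correlation `phi`, and the function names are hypothetical and do
not come from McKenzie and Mikkelson (2007). The sketch builds a correlated and an uncorrelated
population with the same marginals and compares how strongly each cell favors the correlated
population.

```python
import math


def cell_probabilities(p_x: float, p_y: float, phi: float) -> dict:
    """Cell probabilities for two binary variables with the given marginals.

    p_x, p_y: assumed probabilities that each variable is "present".
    phi: assumed population correlation (phi coefficient); 0 means uncorrelated.
    """
    cov = phi * math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    cells = {
        "A": p_x * p_y + cov,              # both present
        "B": p_x * (1 - p_y) - cov,        # x present, y absent
        "C": (1 - p_x) * p_y - cov,        # x absent, y present
        "D": (1 - p_x) * (1 - p_y) + cov,  # both absent
    }
    if min(cells.values()) <= 0:
        raise ValueError("phi is too large for these marginals")
    return cells


def log_likelihood_ratios(p_x: float, p_y: float, phi: float) -> dict:
    """log[P(cell | correlated) / P(cell | uncorrelated)] for each cell."""
    correlated = cell_probabilities(p_x, p_y, phi)
    uncorrelated = cell_probabilities(p_x, p_y, 0.0)
    return {cell: math.log(correlated[cell] / uncorrelated[cell]) for cell in correlated}


if __name__ == "__main__":
    print(log_likelihood_ratios(0.1, 0.1, 0.5))  # presence rare
    print(log_likelihood_ratios(0.9, 0.9, 0.5))  # absence rare
```

Under these assumed values, a Cell A observation carries the largest log likelihood ratio when
presence is rare, and a Cell D observation carries the largest when absence is rare, which is the
reversal the event rarity hypothesis describes.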

McKenzie  and  Mikkelson  (2007)  tested  the  event  rarity  hypothesis  by  presenting 
participants  with  scenarios  in  which  “presence”  was  rare  as  well  as  scenarios  in  which  “absence” 
was  rare.  They  hypothesized  that  in  situations  where  absence  was  rare,  bias  for  joint-present 
information  (Cell  A  observations)  would  be  reversed  to  a  bias  for  joint-absent  information  (Cell 
D  observations).  To  test  this,  they  manipulated  the  rarity  of  an  observation  (joint  presence  or 
joint  absence)  by  adjusting  the  language  used  in  the  scenario.  Specifically,  in  one  of  the 
scenarios, presence was rare when the variables were “high school dropout” and “mental
illness,” and absence was rare when the variables were “high school graduate” and “mental
health.” The levels of the variables in question were “presence” and “absence” as discussed
above. The underlying assumption, of course, is that given an unspecified population, high
school graduates are more common than high school dropouts, and similarly mental health is
more common than mental illness (although arguments could be made against this assumption).
McKenzie  and  Mikkelson  did,  in  fact,  find  support  for  a  reversal  such  that  the  results  supported  a 
Cell  A  bias  when  presence  was  rare  and  a  Cell  D  bias  when  absence  was  rare,  but  only  in 
conditions  where  the  variables  were  concrete  rather  than  abstract. 

Kelley (2007) presented participants with correlationally determinate and indeterminate
hypothetical data. In the experimental task, participants were asked to review the sets of
hypothetical data and rank order the samples with respect to the likelihood that the causal
candidate produced the effect in the sample. The probability of each sample being drawn from a
correlated versus an uncorrelated population was calculated. In most conditions, participants’
behavior was reflective of these objective probabilities. Participants showed differential
treatment of the two types of correlationally indeterminate data samples: they ranked the samples
in which the causal candidate was present on each observation (indeterminate-present)
consistently with the objective probabilities, whereas their rankings of the samples in which the
causal candidate was absent on each observation (indeterminate-absent) were not consistent with
those probabilities. No clear data pattern emerged for treatment of the indeterminate-absent
samples. Similarly, in an additional study by Kelley et al. (2007), the results showed that
participants were sensitive to the ratio of observations in Cell A to Cell B for
indeterminate-present samples, but were not sensitive to the ratio of observations in Cell C to
Cell D for indeterminate-absent samples (figure 2).




Figure 2. The mean relationship ratings as a function of the A:B or C:D ratio of cell
frequencies in Kelley et al. (2007). A:B pertains to the indeterminate-present
condition; C:D pertains to the indeterminate-absent condition. Error bars
represent the standard errors of the means. (Horizontal axis: cell frequency ratio.)

Figure 3 displays the results from Kelley (2007). For the indeterminate-present samples,
participants’ rankings reflected the objective probabilities that were previously determined in a
statistical simulation. The objective probabilities represent the likelihood that each particular
sample was drawn from a correlated population and from an uncorrelated population (i.e., the
probability that a sample of six Cell A observations and two Cell B observations would be drawn
from a positively correlated population versus an uncorrelated population). Specifically, the
probability of an indeterminate-present sample with 6 Cell A and 2 Cell B observations (Ip6:2)
being drawn from a correlated population is greater than the probability of its being drawn from
an uncorrelated population. Participants ranked Ip6:2 samples highly, thus indicating that the
sample showed strong evidence of a relationship between x and y. In other words, for
indeterminate-present samples, when the probability that the sample came from a correlated
population was high, participants ranked the sample high, and when the probability that the sample
came from a correlated population was relatively low, participants ranked the sample low. This
finding is illustrated in figure 3. However, as also shown in figure 3, participants’ mean ranks
were roughly equal for both types of indeterminate-absent samples despite the large difference in
objective probabilities for the two types. Thus, the mean ranks for indeterminate-absent samples
did not reflect the objective probabilities.
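
The statistical simulation used by Kelley (2007) is not described in this report. As a rough
stand-in, the Python sketch below uses exact binomial likelihoods to show how a 6:2 split and a
2:6 split can differ in the support they give to a correlated population. The two probabilities of
the effect given the cause (0.75 for a positively correlated population, 0.5 for an uncorrelated
one) and the function names are assumptions made for illustration only.

```python
from math import comb


def sample_likelihood(n_effect: int, n_no_effect: int, p_effect: float) -> float:
    """Binomial probability of the observed split, given P(effect | cause present)."""
    n = n_effect + n_no_effect
    return comb(n, n_effect) * p_effect**n_effect * (1 - p_effect) ** n_no_effect


def likelihood_ratio(
    n_effect: int,
    n_no_effect: int,
    p_correlated: float = 0.75,   # assumed value, not from the report
    p_uncorrelated: float = 0.5,  # assumed value, not from the report
) -> float:
    """Likelihood under the correlated population divided by the uncorrelated one."""
    return sample_likelihood(n_effect, n_no_effect, p_correlated) / sample_likelihood(
        n_effect, n_no_effect, p_uncorrelated
    )


if __name__ == "__main__":
    print(likelihood_ratio(6, 2))  # greater than 1: favors the correlated population
    print(likelihood_ratio(2, 6))  # less than 1: favors the uncorrelated population
```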




Figure 3. Data from Kelley (2007). Mean rank results plotted against the simulation
results (probabilities of each indeterminate sample type being drawn from
correlated and uncorrelated populations). Sample types are denoted as
indeterminate-present with a Cell A to B ratio of 6:2 (Ip6:2), indeterminate-present
with a Cell A to B ratio of 2:6 (Ip2:6), indeterminate-absent with a
Cell C to D ratio of 6:2 (Ia6:2), and indeterminate-absent with a Cell C to D
ratio of 2:6 (Ia2:6).

Kelley (2007) suggested that the difference in results between indeterminate-present and
indeterminate-absent samples may be explained by the previously described event rarity theory.
In Kelley’s study, the indeterminate-present samples only included observations where the causal
candidate was present and the indeterminate-absent samples only included observations where
the causal candidate was absent. In both sample types, the effect varied. Event rarity would
predict no effect of cell ratio in the indeterminate-absent samples. Specifically, given that the
sample lacks rare observations, the sample may not be informative enough for the decision
maker to be sensitive to the probability of the sample being drawn from a correlated versus
uncorrelated population.




Research  questions 


The present study was designed to further evaluate and test predictions of the event rarity
hypothesis in varying conditions including correlationally indeterminate samples. The main
research question was whether participants’ responses would be unbiased when the
occurrence of a variable was as likely as its non-occurrence. This question and its
sub-questions are described in more detail below.

Symmetric  variables 

Given that McKenzie and Mikkelson (2007) found a reversal of bias from Cell A to Cell D
when co-nonoccurrence (joint absence) was rare, the present study tested whether no bias would
be demonstrated under certain conditions. Specifically, if the variable levels are symmetric, then
one level of the variable is as likely as the other level; thus, the event rarity theory would
predict that no bias be demonstrated for either Cell A or Cell D.

Correlationally  indeterminate  data 

Kelley et al. (2007) have previously given participants samples with an indeterminate
correlational relationship and asked them to judge the causal relationship between the variables.
Specifically, two types of indeterminate samples were presented: indeterminate-present samples,
where the causal variable was present on each observation in the sample, and indeterminate-absent
samples, where the causal variable was absent on each observation in the sample. Also, the ratio
of Cell A to Cell B observations (for indeterminate-present samples) and the ratio of Cell C to
Cell D observations (for indeterminate-absent samples) were varied in these studies. The authors
found that there was a strong effect of cell ratio for the indeterminate-present samples and little
to no effect in the indeterminate-absent samples. As mentioned above, the event rarity theory
would predict the null effect of cell ratio because indeterminate-absent samples may not provide
enough information. However, previous research has suggested that participants struggle with
reasoning about “absent” information (Wason & Johnson-Laird, 1972). This leads to the second
research question in this study, which addressed whether the null effect of cell ratio in the
indeterminate-absent samples is a reflection of the lack of rare (informative) observations or of
difficulties with reasoning about “absent” information on the part of the participant. The event
rarity theory would predict that a strong data pattern would emerge when the indeterminate
sample contains “rare” observations and a weak or null effect when the indeterminate sample
contains “common” observations. Alternatively, if responses are a consequence of difficulties
processing absent information, then the results should show a weak or null effect of cell ratio in
the indeterminate-absent samples.

Event  rarity  manipulation 

Finally, the study tested whether event rarity can be manipulated without changing the
variable labels. This matters because McKenzie and Mikkelson’s (2007) results supporting the
event rarity hypothesis could otherwise be explained as a framing effect (i.e., a systematic change
in responses resulting from the positive or negative frame of the question). In the present study,
the variable labels were constant across conditions while the cover story varied with respect to
the group that the data described. The three groups were chosen with respect to stereotypes and
beliefs about those groups.


Research  objective 


The  objective  of  this  study  was  to  evaluate  the  role  of  event  rarity  in  the  perception  of 
correlationally  indeterminate  data. 


Methods 


General 

The study protocol was approved in advance by the Bowling Green State University Human
Subjects Review Board (HSRB) and informed consent was obtained. The study attempted to
produce a reversal of bias using correlationally indeterminate samples varying two levels of cell
ratio and by manipulating event rarity with respect to the subpopulation described in the
instructions. The design was a 3 (group membership) × 2 (indeterminate sample type) × 2 (cell
ratio) between-subjects design, yielding 12 conditions.
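
The factorial structure can be enumerated directly. The Python sketch below is only an
illustration: the level labels are taken from the descriptions in this report, while the variable
names are hypothetical.

```python
from itertools import product

# Factor levels as described in the Methods section; the names are illustrative.
GROUPS = ("nurses", "politicians", "hotel managers")
SAMPLE_TYPES = ("indeterminate-AB", "indeterminate-CD")
CELL_RATIOS = ("6:2", "2:6")

# Each participant is assigned to exactly one combination (between-subjects).
conditions = list(product(GROUPS, SAMPLE_TYPES, CELL_RATIOS))

if __name__ == "__main__":
    for group, sample_type, ratio in conditions:
        print(group, sample_type, ratio)
    print(len(conditions))  # 3 x 2 x 2 combinations
```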

Participants 

Participants  were  163  students  enrolled  in  an  Introduction  to  Psychology  course  at  a 
Midwestern  university.  They  did  not  receive  any  compensation  for  participation. 

Procedure 

Participants  completed  a  paper  and  pencil  covariation  assessment  task.  The  task  instructions 
stated  that  they  were  to  look  at  a  sample  of  data  gathered  in  a  hypothetical  study  of  gender  and 
personality  traits.  The  two  variables  in  the  data  were  gender,  the  levels  of  which  were  male  and 
female,  and  personality  trait,  the  levels  of  which  were  selfish  and  generous.  Since  the  levels  of 
the  variables  were  not  presence  and  absence,  the  contingency  table  was  labeled  such  that  Cell  A 
corresponded  to  observations  where  gender  was  female  and  trait  was  generous,  Cell  B  where 
gender  was  female  and  trait  was  selfish,  Cell  C  where  gender  was  male  and  trait  was  generous, 
and  Cell  D  where  gender  was  male  and  trait  was  selfish  (figure  4). 

After reading the instructions and cover story, participants saw one sample of eight
observations in a summary format. An example of the task is included in the appendix.
Participants saw one of four possible sample types: indeterminate-AB with a cell ratio of 6:2,
indeterminate-AB with a cell ratio of 2:6, indeterminate-CD with a cell ratio of 6:2, or
indeterminate-CD with a cell ratio of 2:6. In indeterminate-AB samples, the level of gender was
always female and in indeterminate-CD samples, the level of gender was always male. A cell
ratio of 6:2 indicates six Cell A and two Cell B (or six Cell C and two Cell D) observations in the
sample whereas a cell ratio of 2:6 indicates two Cell A and six Cell B (or two Cell C and six Cell
D). After viewing the sample, participants were asked to rate the relationship between gender
and trait on a scale of 0 to +10. Finally, participants ranked the four possible observations from
1 to 4 with 1 being the most informative and 4 being the least informative.
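
The four sample types can be written out as cell frequencies. This is a minimal Python sketch
under the cell labeling described above; the function name `build_sample` and the short codes
"AB" and "CD" are hypothetical.

```python
def build_sample(sample_type: str, cell_ratio: str) -> dict:
    """Return the cell frequencies for one eight-observation sample.

    sample_type: "AB" (gender always female) or "CD" (gender always male).
    cell_ratio: "6:2" or "2:6", read as first cell : second cell.
    """
    first, second = (int(part) for part in cell_ratio.split(":"))
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    if sample_type == "AB":
        counts["A"], counts["B"] = first, second
    elif sample_type == "CD":
        counts["C"], counts["D"] = first, second
    else:
        raise ValueError("sample_type must be 'AB' or 'CD'")
    return counts


if __name__ == "__main__":
    print(build_sample("AB", "6:2"))  # six generous females, two selfish females
    print(build_sample("CD", "2:6"))  # two generous males, six selfish males
```

In every sample type one row of the contingency table is empty, which is what makes the samples
correlationally indeterminate.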

Group  membership 

One  third  of  the  participants  were  told  that  the  hypothetical  study  surveyed  nurses,  which  is 
commonly  believed  to  be  a  job  held  by  more  women  than  men  and  implies  generosity,  thus  in 
this  condition  Cell  A  (generous,  female)  was  common  and  Cell  D  (selfish,  male)  was  rare.  One 
third  were  told  that  the  study  surveyed  politicians,  which  is  a  stereotypically  male  dominated 
field  and  has  negative  connotations  such  as  selfishness,  thus  in  this  condition  Cell  A  (generous, 
female) was rare and Cell D (selfish, male) was common. The final third of participants were
told that the study surveyed hotel managers, which is not stereotypically gender specific or
personality trait specific, thus in this condition Cell A (generous, female) should be as
likely as Cell D (selfish, male). An informal survey supported these stereotypical social beliefs
about the variables in question.


                        Trait
                  Generous   Selfish
Gender   Female      A          B
         Male        C          D

Figure 4. Contingency table describing variables used in study.


Results 


Six of the 163 participants were excluded from the analyses because they did not follow the
instructions properly, leaving 157 participants in the analyses. Mean ranks were calculated
for each observation type and are displayed in figure 5. However, given the dependent nature of
the ranking task, a chi-square test was used to summarize and analyze these data. The proportion
of times that Cell A and the proportion of times that Cell D were ranked as the most informative
were calculated for each condition (see table). These data suggest that there is a preference for
Cell D when the sample is indeterminate-CD and the ratio is 2:6, regardless of subpopulation
type. Alternatively, Cell A is ranked as the most informative when the sample is indeterminate-AB
and the ratio is 6:2. The rarity manipulation seemingly did not influence these data as
predicted.

Figure 5. Mean ranks for each contingency table cell. The cells were ranked from 1
(most informative observation) to 4 (least informative observation).
(Horizontal axis: contingency table cells A, B, C, and D.)




Table

Proportions of times Cell A and Cell D were ranked most informative.

Ratio Type   Ind. Type   Rarity Type       n    Prop. Cell A   Prop. Cell D
6:2          Ind-AB      Cell D rare       14   0.857          0.143
2:6          Ind-AB      Cell D rare       12   0.167          0.167
6:2          Ind-CD      Cell D rare       12   0.417          0.00
2:6          Ind-CD      Cell D rare       15   0.52           0.533
6:2          Ind-AB      Cell A rare       13   0.846          0.154
2:6          Ind-AB      Cell A rare       13   0.385          0.077
6:2          Ind-CD      Cell A rare       13   0.308          0.154
2:6          Ind-CD      Cell A rare       16   0.25           0.625
6:2          Ind-AB      Equally Likely    14   0.714          0.071
2:6          Ind-AB      Equally Likely    12   0.167          0.00
6:2          Ind-CD      Equally Likely    11   0.455          0.00
2:6          Ind-CD      Equally Likely    12   0.25           0.50

Note. Indeterminate sample types are denoted Ind-AB (indeterminate-AB) and Ind-CD
(indeterminate-CD).
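
To see the pattern across rarity types, the proportions in the table can be pooled within a cell
ratio and sample type. The Python sketch below copies the table values and is otherwise an
illustration: the function name is hypothetical, and it assumes that each reported proportion is a
count divided by n, so that rounding the product recovers the count.

```python
# (ratio, sample type, rarity type, n, prop. Cell A, prop. Cell D), copied from the table
ROWS = [
    ("6:2", "Ind-AB", "Cell D rare", 14, 0.857, 0.143),
    ("2:6", "Ind-AB", "Cell D rare", 12, 0.167, 0.167),
    ("6:2", "Ind-CD", "Cell D rare", 12, 0.417, 0.00),
    ("2:6", "Ind-CD", "Cell D rare", 15, 0.52, 0.533),
    ("6:2", "Ind-AB", "Cell A rare", 13, 0.846, 0.154),
    ("2:6", "Ind-AB", "Cell A rare", 13, 0.385, 0.077),
    ("6:2", "Ind-CD", "Cell A rare", 13, 0.308, 0.154),
    ("2:6", "Ind-CD", "Cell A rare", 16, 0.25, 0.625),
    ("6:2", "Ind-AB", "Equally Likely", 14, 0.714, 0.071),
    ("2:6", "Ind-AB", "Equally Likely", 12, 0.167, 0.00),
    ("6:2", "Ind-CD", "Equally Likely", 11, 0.455, 0.00),
    ("2:6", "Ind-CD", "Equally Likely", 12, 0.25, 0.50),
]


def pooled_proportion(ratio: str, sample_type: str, cell: str) -> float:
    """Pool one cell's 'ranked most informative' proportion across rarity types."""
    column = {"A": 4, "D": 5}[cell]
    rows = [r for r in ROWS if r[0] == ratio and r[1] == sample_type]
    # Assumption: proportion * n, rounded, recovers the underlying count.
    count = sum(round(r[column] * r[3]) for r in rows)
    total = sum(r[3] for r in rows)
    return count / total


if __name__ == "__main__":
    print(pooled_proportion("6:2", "Ind-AB", "A"))  # Cell A ranked first, 6:2 Ind-AB
    print(pooled_proportion("2:6", "Ind-CD", "D"))  # Cell D ranked first, 2:6 Ind-CD
```

Under that assumption, Cell A was ranked most informative by 33 of 41 participants in the 6:2
indeterminate-AB conditions, and Cell D by 24 of 43 participants in the 2:6 indeterminate-CD
conditions.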


In this study, participants rated the relationship between two dichotomous variables on a
scale from 0 (no relationship) to +10 (perfect relationship) after viewing a summary format
sample of data. To analyze the data, a 3 (group membership) X 2 (indeterminate sample type) X
2 (cell ratio) between-subjects analysis of variance (ANOVA) was used. A significant
main effect of cell ratio emerged such that when the ratio was 2:6 the relationship was rated
higher than when the ratio was 6:2, F(1, 148) = 4.05, p = .046. There was also a significant
interaction between subpopulation and indeterminate sample type, F(2, 148) = 5.3, p = .006, in
that when Cell A was the rare observation (the surveyed group was politicians),
indeterminate-AB samples were rated higher than indeterminate-CD samples. When Cell A and
Cell D were equally likely (the surveyed group was hotel managers), indeterminate-AB samples
were rated lower than indeterminate-CD samples. When Cell D was the rare observation (the
surveyed group was nurses), indeterminate-AB samples were rated similarly to indeterminate-CD
samples. The results are summarized in figures 6, 7, and 8. This interaction is difficult to
interpret; however, it suggests that participants’ ratings may also reflect their degree of
confidence in the rating. Specifically, when Cell A was rare and participants saw a sample with
Cell A observations, this may have increased their confidence in the rating, leading them to
rate it higher than samples that did not contain Cell A observations.
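The degrees of freedom in the F statistics above follow from the factorial design. The Python sketch below (function names and factor labels are illustrative, not from the original analysis) computes the degrees of freedom for a main effect or interaction as the product of (levels - 1) over the factors involved. It also shows the total N that an error df of 148 would imply, assuming a fully crossed between-subjects model with all interaction terms, where error df = N minus the number of design cells.

```python
from math import prod

# Factor levels in the 3 x 2 x 2 between-subjects design described above.
LEVELS = {"group": 3, "sample_type": 2, "cell_ratio": 2}


def effect_df(factors, levels=LEVELS):
    """df for a main effect or interaction: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in factors)


def n_cells(levels=LEVELS):
    """Number of cells in the fully crossed design."""
    return prod(levels.values())


def implied_total_n(df_error, levels=LEVELS):
    """Total N implied by the error df, assuming a full factorial model
    in which df_error = N - (number of cells)."""
    return df_error + n_cells(levels)


if __name__ == "__main__":
    print(effect_df(["cell_ratio"]))            # main effect of cell ratio
    print(effect_df(["group", "sample_type"]))  # group x sample type interaction
    print(n_cells())                            # cells in the design
    print(implied_total_n(148))                 # N implied by error df of 148
```

The effect degrees of freedom (1 and 2) agree with the reported F(1, 148) and F(2, 148). The implied N is only as good as the full-factorial assumption and is not a figure stated in this section.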




[Figure 6 plot. Legend: Ind-AB, Ind-CD.]


Figure  6.  Mean  relationship  ratings  when  Cell  A  is  the  rare  observation.  Sample  types 
are  indeterminate-AB  and  indeterminate-CD,  denoted  Ind-AB  and  Ind-CD 
respectively.  Cell  ratios  were  2:6  and  6:2.  Bars  represent  standard  error  of 
the  mean. 




Figure  7.  Mean  relationship  ratings  when  Cell  A  and  Cell  D  observations  are  equally 
likely.  Sample  types  are  indeterminate-AB  and  indeterminate-CD,  denoted 
Ind-AB  and  Ind-CD  respectively.  Cell  ratios  were  2:6  and  6:2.  Bars  represent 
standard  error  of  the  mean. 


[Plot x-axis: cell frequency ratio.]


Figure  8.  Mean  relationship  ratings  when  Cell  D  is  the  rare  observation.  Sample  types 
are  indeterminate-AB  and  indeterminate-CD,  denoted  Ind-AB  and  Ind-CD 
respectively.  Cell  ratios  were  2:6  and  6:2.  Bars  represent  standard  error  of 
the  mean. 




Discussion 


The results do not support the event rarity hypothesis; however, they do suggest that under
some conditions behavior may be consistent with confirmatory hypothesis testing.
Klayman  and  Ha  (1987)  described  positive  test  strategy  as  testing  a  hypothesis  by  selecting  and 
attending  to  observations  where  the  effect  is  expected  or  has  occurred.  By  this  definition,  it 
could  be  argued  that  participants  in  this  study  were  giving  preferential  weight  to  observations 
that  confirmed  their  expectations.  For  example,  when  participants  were  told  that  the  hypothetical 
study  surveyed  nurses,  the  expectation  is  for  generous  females  rather  than  selfish  males.  In  other 
words,  the  “common”  observation  is  also  the  expected  observation  and  actual  observations  of 
generous  females  only  reaffirmed  those  expectations.  There  is  moderate  evidence  of  this  in  the 
results  such  that  when  Cell  A  observations  were  “common”  or  expected,  participants  rated 
samples  with  a  high  frequency  of  Cell  A  observations  (indeterminate-AB  with  a  cell  ratio  of  6:2) 
higher  (i.e.,  stronger  support  of  a  relationship)  than  samples  with  a  relatively  low  frequency  of 
Cell  A  observations  (indeterminate-AB  with  a  cell  ratio  of  2:6).  Likewise,  when  Cell  D 
observations  were  “common,”  participants  rated  samples  with  a  high  frequency  of  Cell  D 
observations  (indeterminate-CD  with  a  cell  ratio  of  2:6)  higher  than  those  with  a  relatively  low 
frequency  (indeterminate-CD  with  a  cell  ratio  of  6:2).  In  other  words,  participants  indicated  that 
the  “common”  observations  were  most  informative,  whereas  the  event  rarity  hypothesis  predicts 
just  the  opposite.  It  should  be  noted  that  these  effects  were  weak. 

The  current  study  presented  participants  with  symmetrical  variables  thus  eliminating  the  use 
of  the  variable  levels  “presence”  and  “absence.”  Given  this,  the  labeling  of  the  contingency  table 
is  arbitrary  (i.e.,  the  contingency  table  could  have  been  adjusted  so  that  Cell  A  observations  were 
selfish  males  and  Cell  D  observations  were  generous  females).  However,  a  large  proportion  of 
participants  still  ranked  Cell  A  observations  as  the  most  important  data,  even  when  the  sample 
did not contain any Cell A observations. The reason for this is unclear. It is possible,
however, that this could also be explained by Klayman and Ha’s (1987) positive test strategy,
which states that expected events and outcomes are given preferential attention in hypothesis
testing. In this study there are four possible observations: generous females, selfish females,
generous males, and selfish males. Regardless of the group (nurses, politicians, hotel
managers) presented in the
instructions  or  cover  story,  participants  may  have  the  expectation  that  females  are  generous. 
Therefore,  this  observation  is  the  most  important  to  the  participant  despite  the  details  of  the 
group  involved  in  the  hypothetical  study  or  the  actual  sample  observed,  thus  resulting  in  Cell  A 
observations being ranked as the most informative by a large proportion of participants.

The  results  of  the  current  study  seem  to  be  more  consistent  with  positive  test  strategy  than 
the  event  rarity  hypothesis.  One  reason  for  this  may  be  that  the  social  beliefs  and  stereotypes 
invoked  by  the  groups  may  not  have  been  sufficient  to  manipulate  event  rarity  given  that  the 
biases  demonstrated  do  not  reconcile  with  event  rarity  theory.  Rather,  the  stereotypes  may  have 
only  brought  forward  expectations  thus  yielding  results  consistent  with  positive  test  strategy. 

Another  reason  may  be  that  participants  were  unclear  about  how  to  proceed  with  the  task. 
Three participants reported that they did not have enough information to make the judgment with
any degree of confidence. In other words, these participants guessed the answer. There could
have  been  a  number  of  other  participants  who  merely  guessed  the  answer  without  providing  any 
indication  of  such. 

One  final  reason  may  be  the  use  of  symmetrical  variables  rather  than  asymmetric  variables 
with  the  levels  of  “presence”  and  “absence.”  The  stereotypes  and  social  beliefs  concerning  the 
groups  may  not  have  elicited  the  artificial  asymmetry  as  intended.  If  so,  then  participants  may 
have  inferred  the  likelihood  of  each  observation  to  be  equal  thus  leading  them  to  use  an  alternate 
method  to  approach  the  task  (i.e.,  positive  test  strategy). 

Future  studies 

As  previously  discussed,  event  rarity  theory  predicts  the  previously  reported  data  patterns  in 
the  relationship  ratings  of  correlationally  indeterminate  samples.  In  a  future  study,  participants 
will  be  presented  with  indeterminate  samples  describing  a  causal  candidate  and  effect,  the  levels 
of  which  are  to  be  “presence”  and  “absence.”  The  labels  of  the  causal  candidate  and  effect  will 
vary so as to be consistent with McKenzie and Mikkelsen (2007). The goal of this future
project  is  to  replicate  a  reversal  of  effects  from  indeterminate-present  to  indeterminate-absent 
samples  given  the  respective  likelihood  of  the  “presence”  and  “absence”  of  both  variables  in 
question.  If  the  results  of  the  future  study  do,  in  fact,  support  the  event  rarity  theory  then  that 
would  suggest  the  applicability  of  event  rarity  to  the  perception  of  correlationally  indeterminate 
samples.  This  would  further  support  the  conclusion  of  this  current  study  that  the  social  beliefs 
rarity  manipulation  was  insufficient  to  evoke  a  reversal  of  bias. 

Conclusions 


The  objective  of  the  present  study  was  to  evaluate  the  role  of  event  rarity  in  the 
perception of correlationally indeterminate data. Given the results of previous research on
correlationally  indeterminate  data,  it  was  predicted  that  participants’  responses  would  indicate 
“rare”  observations  to  be  the  most  informative  and  for  this  to  also  be  reflected  in  the  correlational 
relationship  ratings  of  the  samples  presented.  Participants  showed  a  bias  for  “common” 
observations  rather  than  “rare”  observations,  however,  thus  not  providing  support  for  the  event 
rarity  hypothesis.  As  previously  discussed,  some  of  the  results  are  consistent  with  positive  test 
strategy.  Further  research  is  ongoing  to  evaluate  the  event  rarity  hypothesis  and  positive  test 
strategy in relation to correlationally indeterminate data and which factors may drive differential
behavior  (e.g.,  preference  for  “common”  observations  over  “rare”  ones  and  vice  versa). 




References 


Anderson,  J.  R.  1990.  The  adaptive  character  of  thought.  Hillsdale  NJ:  Erlbaum. 

Crocker, J. 1981. Judgment of covariation by social perceivers. Psychological Bulletin. 90:
272-292.

Griffiths, T. L., and Tenenbaum, J. B. 2005. Structure and strength in causal induction.
Cognitive Psychology. 51: 334-384.

Heuer  Jr.,  R.J.  1999.  Biases  in  perception  of  cause  and  effect.  In  Psychology  of  Intelligence 
Analysis  (chap.  11).  Retrieved  November  27,  2007,  from 
http://www.au.af.mil/au/awc/awcgate/psvch-intePartl4.html. 

Kao,  S.  F.,  and  Wasserman,  E.  A.  1993.  Assessment  of  an  information  integration  account  of 
contingency  judgment  with  examination  of  subjective  cell  importance  and  method  of 
information processing. Journal of Experimental Psychology: Learning, Memory, and
Cognition.  19:  1363-1386. 

Kelley,  A.  M.  2007.  Bayesian  principles  and  causal  judgment.  Dissertation  Abstracts 
International:  Section  B:  The  Sciences  and  Engineering.  68  (5-B):  3418. 

Kelley,  A.  M.,  Anderson,  R.  B.,  and  Doherty,  M.  E.  2007.  Perception  of  correlation  in 
determinate  and  indeterminate  data. 


Klayman, J., and Ha, Y.-W. 1987. Confirmation, disconfirmation, and information in hypothesis
testing.  Psychological  Review.  94:  211-228. 

Levin,  I.  P.,  Wasserman,  E.  A.,  and  Kao,  S.  F.  1993.  Multiple  methods  for  examining  biased 
information  use  in  contingency  judgments.  Organizational  Behavior  and  Human  Decision 
Processes.  55:  228-250. 

McKenzie,  C.  R.  M.,  and  Mikkelsen,  L.  A.  2007.  A  Bayesian  view  of  covariation  assessment. 
Cognitive  Psychology.  54:  33-61. 

Wason,  P.  C.,  and  Johnson-Laird,  P.  N.  1972.  Psychology  of  reasoning:  Structure  and  content. 
Cambridge,  MA:  Harvard  University  Press. 




Appendix 
Trial  sample 


Instructions: 

One  hundred  NURSES  were  asked  to  participate  in  an  experiment  studying  gender  and 
personality. Each participant’s personality was determined by a survey.

Below,  you  will  see  a  small  sample  of  the  experiment’s  data.  For  each  participant,  you  will 
be  told  whether  the  participant  is  MALE  or  FEMALE  and  whether  the  participant  is 
characterized  as  being  GENEROUS  or  SELFISH.  After  viewing  the  data,  please  answer  the 
following  questions. 


Participant #:  Gender:  Personality Trait:
6               Female   Generous
57              Female   Selfish
3               Female   Selfish
8               Female   Generous
21              Female   Generous
23              Female   Generous
80              Female   Generous
22              Female   Generous



Questions: 

For  the  100  Nurses  who  participated,  how  strong  is  the  relationship  between  gender  and  whether 
their  personality  is  generous  or  selfish. 

Please  rate  the  relationship  between  0  and  +10  using  the  scale  below: 

0  NO  RELATIONSHIP 
+10 STRONG RELATIONSHIP

Rating: _ 

Regardless  of  which  kinds  of  observations  are  shown  in  the  data  table  (above),  rank  the 
following  kinds  of  observations  in  terms  of  how  strongly  they  would  support  evidence  of  a 
relationship  (Place  a  letter  in  each  blank): 

(A)  Female  and  Generous  (B)  Female  and  Selfish 
(C)  Male  and  Generous  (D)  Male  and  Selfish 

#1  (strongest  support) _ 

#2 _ 

#3 _ 

#4  (weakest  support) _ 
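
To illustrate why a sample like the one in this trial is correlationally indeterminate, the following Python sketch tallies the eight observations into the four contingency table cells and attempts to compute a correlation. The function names and the use of the phi coefficient as the correlation measure are assumptions made for illustration; they are not taken from the report. Because the sample contains no males, one marginal total is zero and phi is undefined.

```python
from math import sqrt

# The eight observations from the trial sample: (gender, personality trait).
SAMPLE = [
    ("Female", "Generous"), ("Female", "Selfish"), ("Female", "Selfish"),
    ("Female", "Generous"), ("Female", "Generous"), ("Female", "Generous"),
    ("Female", "Generous"), ("Female", "Generous"),
]

# Cell labels as used in the ranking question.
CELLS = {
    ("Female", "Generous"): "A", ("Female", "Selfish"): "B",
    ("Male", "Generous"): "C", ("Male", "Selfish"): "D",
}


def tally(sample):
    """Count how many observations fall in each contingency table cell."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for observation in sample:
        counts[CELLS[observation]] += 1
    return counts


def phi(counts):
    """Phi coefficient for a 2 x 2 table, or None when it is undefined
    (any row or column total equal to zero)."""
    a, b, c, d = (counts[k] for k in "ABCD")
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return None
    return (a * d - b * c) / denominator


if __name__ == "__main__":
    counts = tally(SAMPLE)
    print(counts)
    print(phi(counts))
```

The tally gives six Cell A and two Cell B observations and no Cell C or Cell D observations, which corresponds to an indeterminate-AB sample with a 6:2 cell ratio in the report's terminology.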




DEPARTMENT  OF  THE  ARMY 

U.S.  Army  Aeromedical 
Research  Laboratory 
Fort  Rucker,  Alabama  36362-0577 


